[api][runtime] Introduce long-term memory in python #332
Conversation
Force-pushed from e85caea to 7c14985.
```python
SUMMARIZE = "summarize"


class ReduceSetup(BaseModel):
```
I'd suggest naming this CompactionStrategy and making it an abstract class that we can provide different implementations of, so we can have a strict limit on which arguments should be specified for each strategy. We can call the current ReduceStrategy CompactionStrategyType.
I think CompactionStrategy.trim(n) might be more straightforward for users, compared to ReduceSetup.trim_setup(n).
- +1 on that. "Compaction" is better terminology; this term is commonly used in many open-source projects in the industry today.
- Should we consider using nouns as the strategy names? For example, summarize -> summarization, trim -> truncation?
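For concreteness, here is a minimal sketch combining these suggestions (an abstract CompactionStrategy whose subclasses declare exactly their own arguments, plus a trim(n) factory); all names are illustrative, not the PR's final API:

```python
from abc import ABC, abstractmethod
from typing import List

from pydantic import BaseModel


class CompactionStrategy(BaseModel, ABC):
    """Abstract base; each concrete strategy declares exactly the
    arguments it needs, so invalid combinations are impossible."""

    @abstractmethod
    def compact(self, items: List[str]) -> List[str]:
        """Reduce the items to a smaller, denser set."""

    @staticmethod
    def trim(n: int) -> "TruncationStrategy":
        return TruncationStrategy(keep_last=n)


class TruncationStrategy(CompactionStrategy):
    keep_last: int

    def compact(self, items: List[str]) -> List[str]:
        # Truncation: keep only the most recent `keep_last` items.
        return items[-self.keep_last:]
```

Usage would then read `CompactionStrategy.trim(10)`, matching the `CompactionStrategy.trim(n)` call style proposed above.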
```python
if memory_set.size >= memory_set.capacity:
    # trigger reduce operation to manage memory set size.
    self._reduce(memory_set)
```
This can be extremely slow. We should proactively do the compaction.
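One way to do this proactively is to schedule compaction in the background once a lower watermark is crossed, instead of blocking add() at full capacity. A hypothetical sketch — the watermark, executor, and class shape are assumptions, and `_reduce` is the method from the diff above:

```python
from concurrent.futures import ThreadPoolExecutor


class ProactiveCompaction:
    # Start compacting at 80% of capacity so the add path never
    # has to block on a synchronous reduce at the capacity limit.
    WATERMARK = 0.8

    def __init__(self) -> None:
        self._executor = ThreadPoolExecutor(max_workers=1)

    def _maybe_schedule_compaction(self, memory_set) -> None:
        if memory_set.size >= memory_set.capacity * self.WATERMARK:
            # Run the potentially slow reduce in the background;
            # _reduce is provided by the memory implementation.
            self._executor.submit(self._reduce, memory_set)
```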
```python
        self.client.delete_collection(name=name)

    @override
    def add(self, memory_set: MemorySet, memory_item: str | ChatMessage) -> None:
```
I have a feeling that adding items to long-term memory can take time, due to embedding. We should probably also provide async APIs.
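A minimal sketch of an async variant, assuming the sync `add` from the diff above stays the primary implementation; the `aadd` name follows a common convention and is an assumption:

```python
import asyncio
from abc import ABC, abstractmethod


class BaseLongTermMemory(ABC):
    @abstractmethod
    def add(self, memory_set, memory_item) -> None:
        """Blocking add; the embedding call happens here."""

    async def aadd(self, memory_set, memory_item) -> None:
        # Offload the blocking embed + write to a worker thread so
        # the caller's event loop is not blocked by embedding latency.
        await asyncio.to_thread(self.add, memory_set, memory_item)
```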
```python
        return self.slice(memory_set=memory_set, offset=offset, n=n)

    @override
    def search(
```
Same here.
Hi, @alnzng. There's a design issue related to the vector store that I'd appreciate your help reviewing. As described in the design doc #339, long-term memory of flink-agents is also based on a vector store. Currently, I provide an implementation based on chroma. In this implementation, I directly use the chroma client rather than the flink-agents ChromaVectorStore.

@xintongsong believes that we can directly build upon the Flink-Agents vector store abstraction. I think this makes sense, but it requires adding some interfaces to it. These interfaces may not be achievable for every vector store; I will conduct research and refinement afterward.
Thanks for raising this design question @wenjin272! I agree with @xintongsong that building long-term memory on top of the existing vector store abstraction is the right direction. A few thoughts on the proposed methods:
However, I have concerns about including collection management in the base interface. This concept is vector-store specific rather than universal. My suggestion: keep collection management implementation specific rather than in the base interface. Each vector store integration can expose collection management using its native terminology and patterns. This approach aligns with how other frameworks, such as LangChain, handle it.
```sh
path=$1
chroma run --path $path
```
Is this script a must-have?
IIUC, you can just add `import chromadb` in the test class, which will "start" chroma automatically. I did this in test_chroma_vector_store.
IIUC, `import chromadb` will start chroma in in-memory mode. For long-term memory, we recommend server-client mode or cloud mode for data persistence, so I start a chroma server here.
To enable PersistentClient [1], specifying the persist_directory parameter while building the ChromaVectorStore should help. IIUC, this is similar to what the `--path $path` parameter offers in the chroma CLI.
[1] https://docs.trychroma.com/docs/run-chroma/persistent-client
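For reference, a minimal sketch of the persistent-client setup (the path and collection name here are arbitrary examples):

```python
import chromadb

# PersistentClient stores data on disk at the given path, so tests
# don't need a separate `chroma run` server process.
client = chromadb.PersistentClient(path="./chroma_data")
collection = client.get_or_create_collection(name="long_term_memory")
```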
Thanks for your suggestion @alnzng. On the reason I want to provide collection-level operations in the base interface: I have reviewed the vector store design in LangChain, and it indeed doesn't provide collection management in its base interface. But if we keep collection management implementation specific rather than in the base interface, the long-term memory layer cannot manage its collections through the common abstraction. Also, I think all the vector stores provide a concept like collection.
Maybe we can have two separate interfaces.
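A hypothetical sketch of this two-interface split — the class and method names are guesses, loosely modeled on the CRUD methods discussed in this PR:

```python
from abc import ABC, abstractmethod
from typing import Any, List


class VectorStore(ABC):
    """Base interface: document-level operations only."""

    @abstractmethod
    def add(self, documents: Any, collection_name: str | None = None) -> None: ...

    @abstractmethod
    def query(self, query: str, limit: int = 10, **kwargs: Any) -> List[Any]: ...


class CollectionManagedVectorStore(VectorStore):
    """Opt-in interface for stores that natively support collections."""

    @abstractmethod
    def create_collection(self, name: str, **kwargs: Any) -> None: ...

    @abstractmethod
    def delete_collection(self, name: str) -> bool: ...
```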
@wenjin272 This was my major concern, since I was not able to verify all existing vector stores, and we can't predict future vector stores. For example, Pinecone uses namespace + index to manage its data; a namespace seems like a "collection". However, it doesn't seem to support a namespace (metadata) update operation, which means the method below won't work with Pinecone: https://docs.pinecone.io/reference/api/2025-10/data-plane/createnamespace

@xintongsong's suggestion of adding a new interface for indicating collection support seems to be a safe solution.
Force-pushed from 94bee98 to c80d570.
Hi @alnzng, I have added some interfaces to the vector store abstraction as discussed above. The async execution of add and search is left for a follow-up.
| """ | ||
|
|
||
| @abstractmethod | ||
| def add_embedding( |
add_embedding and query_embedding are not meant for users to call. We probably should mark them as protected.
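In Python terms, "protected" would just be the leading-underscore convention; a small sketch of the rename (signatures are illustrative):

```python
from abc import ABC, abstractmethod
from typing import Any, List


class BaseVectorStore(ABC):
    # The underscore prefix signals these are hooks for store
    # implementations, not part of the user-facing API.
    @abstractmethod
    def _add_embedding(self, embedding: List[float], document: Any) -> None: ...

    @abstractmethod
    def _query_embedding(self, embedding: List[float], limit: int = 10) -> List[Any]: ...
```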
```python
class Collection(BaseModel):
    """Represents a collection of documents."""

    name: str
    size: int
```
What is this size used for and how do we keep it consistent?
```python
            The deleted collection
        """


def maybe_cast_to_list(value: Any | List[Any]) -> List[Any] | None:
```
Should this be private?
```python
class LongTermMemoryBackend(Enum):
    """Backend for Long-Term Memory."""

    VectorStore = "vectorstore"
```
Suggested change:
```diff
-    VectorStore = "vectorstore"
+    EXTERNAL_VECTOR_STORE = "external_vector_store"
```
```python
    item_type: Type[str] | Type[ChatMessage]
    capacity: int
    compaction_strategy: CompactionStrategy
    size: int = Field(default=0, exclude=True)
```
It's fragile to maintain the size separately from the underlying actual collection. We might consider getting the size from the store.
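A sketch of reading the size from the store on demand; the `count` method on the store is an assumption:

```python
class MemorySet:
    def __init__(self, name: str, store) -> None:
        self.name = name
        self._store = store

    @property
    def size(self) -> int:
        # Delegate to the backing store so the size can never
        # drift from the actual collection contents.
        return self._store.count(collection_name=self.name)
```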
```python
from flink_agents.api.prompts.prompt import Prompt


def summarize(
```
It doesn't make sense to always compact the entire long-term memory into one message. I think we should only merge similar messages and discard meaningless ones. As a result, we should get a smaller set of messages with higher information density.
It might require more effort than we can take on in this PR to tune the compaction strategy for ideal performance. But at least in the abstraction, we should not assume this returns only one message.
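At the abstraction level, the change suggested here might be as small as the return type; a sketch (types borrowed from the diff above, signature illustrative):

```python
from typing import List


def summarize(items: List["ChatMessage"], ctx: "RunnerContext") -> List["ChatMessage"]:
    """Compact `items` into a smaller list of messages.

    Returning a list (rather than a single message) leaves room for
    merging similar messages and discarding low-value ones, instead
    of collapsing the whole memory into one summary.
    """
    ...
```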
| """ | ||
|
|
||
| @abstractmethod | ||
| def delete_memory_set(self, name: str) -> MemorySet: |
It probably makes sense to just return a boolean. The memory set should no longer be bound to the underlying store.
```python
    )
    id: str | None = Field(
        default=None, description="Unique identifier of the document."
    )
```
Out of curiosity, did we change the code format style? It seems there are many diffs like this.
I just ran `ruff format` before committing the code. It looks like `ruff check` only checks the lint rules, while `ruff format` only formats the code style.
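For reference, the two commands do different jobs and are typically both run:

```sh
ruff check .    # lint rules only (use --fix to auto-fix)
ruff format .   # code formatting only
```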
| """ | ||
|
|
||
| def add( | ||
| self, documents: Document | List[Document], collection_name: str | None = None, **kwargs: Any |
The newly added CRUD methods look flexible enough to handle operations across multiple collections, which is good. However, the current vector store implementation is limited to a single collection. To keep the behavior consistent, we should update the existing implementation to support multiple collections as well. For example, the query implementation needs to be extended to support retrieval from multiple collections to align with the new CRUD methods.
```python
    )

    # Generate embeddings for each document
    embeddings = [embedding_model.embed(doc.content) for doc in documents]
```
Chroma seems to have a size limitation on each request when persisting embeddings: batches should contain fewer than 41,666 embeddings. This has been discussed in the Chroma community (see GitHub issue #1049). I'm not sure if this restriction has changed recently, but to be safe we should add a check for requests exceeding 41,666 embeddings, similar to how LlamaIndex handles it today: https://github.com/run-llama/llama_index/blob/main/llama-index-integrations/vector_stores/llama-index-vector-stores-chroma/llama_index/vector_stores/chroma/base.py#L97C1-L97C72
That being said, it may be better to abstract the embedding function at the document level and allow each vector store implementation to override it as needed.
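A minimal sketch of the batching guard, modeled on the LlamaIndex approach linked above; the constant value mirrors the limit discussed, and the helper name is an assumption:

```python
MAX_CHROMA_BATCH_SIZE = 41665  # stay under Chroma's ~41,666 per-request limit


def _add_in_batches(collection, ids, embeddings, documents) -> None:
    # Split oversized requests into chunks Chroma will accept.
    for start in range(0, len(ids), MAX_CHROMA_BATCH_SIZE):
        end = start + MAX_CHROMA_BATCH_SIZE
        collection.add(
            ids=ids[start:end],
            embeddings=embeddings[start:end],
            documents=documents[start:end],
        )
```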
Commits:
- …de a vector store based implementation.
- address comments. fix
- …d provide a vector store based implementation. address comments.
Force-pushed from 9107487 to 8368345.
xintongsong left a comment:
LGTM.
@alnzng, would you like to take another look as well?
```python
response: ChatMessage = _generate_summarization(
    items, memory_set.item_type, strategy, ctx
)
```
The amount of memory to be compacted can be large and may exceed the context window of the model. We should take that into consideration. Let's add a TODO here and create a follow-up issue.
Linked issue: #331
Purpose of change
Introduces the long-term memory interface in Python, and provides an implementation based on Chroma.
This is the first PR of three to introduce long-term memory in Python.
Tests
Unit test
API
Yes, adds long-term-memory-related APIs.
Documentation
doc-needed